Abstract: In this paper, we consider the problem of designing 
codewords for DNA storage systems and DNA computers that 
are unlikely to fold back onto themselves to form undesirable 
secondary structures. Secondary structure formation causes a 
DNA codeword to become less active chemically, thus rendering 
it useless for the purpose of DNA computing. It also defeats 
the read-back mechanism in a DNA storage system, so that 
information stored in such a folded DNA codeword cannot 
be retrieved. Based on some simple properties of a dynamic- 
programming algorithm, known as Nussinov's method, which is 
an effective predictor of secondary structure given the sequence 
of bases in a DNA codeword, we identify some design criteria 
that reduce the possibility of secondary structure formation in 
a codeword. These design criteria can be formulated in terms 
of the requirement that the Watson-Crick distance between a 
DNA codeword and a number of its shifts be larger than a given 
threshold. This paper addresses both the issue of enumerating 
DNA sequences with such properties and the problem of practical 
DNA code construction. 

I. Introduction 

The last century was marked by the birth of two major 
scientific and engineering disciplines: silicon-based computing 
and the theory and technology of genetic data analysis. The 
research field very likely to dominate the area of scientific 
computing in the foreseeable future is the merger of these two 
disciplines, leading to unprecedented possibilities for applica- 
tions in varied areas of engineering and science. The first steps 
toward this goal were made in 1994, when Leonard Adleman [1] 
solved a quite unremarkable computational problem, an 
instance of the directed travelling salesman problem on a 
graph with seven nodes, with an exceptional method. The 
technique used for solving the problem was a new techno- 
logical paradigm, termed DNA computing. DNA computing 
introduced the possibility of using genetic data to tackle 
computationally hard classes of problems that are otherwise 
impossible to solve using traditional computing methods. The 
way in which DNA computers make it possible to achieve this 
goal is through massive parallelism of operation on nano-scale, 
low-power, molecular hardware and software systems. 

One of the major obstacles to efficient DNA computing, 
and more generally DNA storage [7] and signal processing [11], 
is the very low reliability of single-stranded DNA sequence 
operations. DNA computing experiments require the 
creation of a controlled environment that allows for a set of 
single-stranded DNA codewords to bind (hybridize) with their 
complements in an appropriate fashion. If the codewords are 
not carefully chosen, unwanted, or non-selective, hybridization 
may occur. Even more detrimental is the fact that a single- 
stranded DNA sequence may fold back onto itself, forming a 
secondary structure which completely inhibits the sequence 
from participating in the computational process. Secondary 
structure formation is also a major bottleneck in DNA storage 
systems. For example, it was reported in [7] that 30% of 
read-out attempts in a DNA storage system failed due to the 
formation of special secondary structures called hairpins in 
the single-stranded DNA molecules used to store information. 

So far, the focus of coding for DNA computing has been 
directed exclusively towards constructing large sets of DNA 
codewords with fixed base frequencies (constant GC-content) 
and prescribed Hamming/reverse-complement Hamming distance. 
Such sets of codewords are expected to result in very 
rare hybridization error events. As an example, it was shown 
in [3] that there exist 94595072 codewords of length 20 with 
minimum Hamming distance d = 5 and with exactly 10 G/C 
bases. At the same time, the Wisconsin DNA Group, led by D. 
Shoemaker, reported that through extensive computer search 
and experimental testing, only 9105 sequences of length 20 
at Hamming distance at least 5 were found to be free of 
secondary structure at temperatures of 61±5°C. Since at lower 
ambient temperatures the probability of secondary structure 
formation is even higher, it is clear that the effective number 
of codewords useful for DNA computing is extremely small. 

In this paper, we investigate properties of DNA sequences 
that may lead to undesirable folding. Our approach is based on 
analysis of a well-known algorithm for approximately deter- 
mining DNA secondary structure, called Nussinov's method. 
This analysis allows us to extract some design criteria that 
yield DNA sequences that are unlikely to fold undesirably. 
These criteria reduce to the requirement that the first few 
shifts of a DNA codeword have the property that they do 
not contain Watson-Crick complementary matchings with the 
original sequence. We consider the enumeration of sequences 
having the shift property and provide some simple construction 
strategies which meet the required restrictions. To the best of 
our knowledge, this is the first attempt in the literature aimed at 
providing a rigorous setting that links DNA folding properties 
to constraints on the primary structure of the sequences. 



II. DNA Secondary Structure: Properties and 
Code Design Issues 

DNA of higher species consists of two complementary 
chains twisted around each other to form a double helix. Each 
chain is a linear sequence of nucleotides, or bases — two 
purines, adenine (A) and guanine (G), and two pyrimidines, 
thymine (T) and cytosine (C). The purine bases and pyrimidine 
bases are Watson-Crick (WC) complements of each other, 
in the sense that 

$$\overline{A} = T, \quad \overline{T} = A, \quad \overline{G} = C, \quad \overline{C} = G. \qquad (1)$$



More specifically, in a double helix, the base A pairs with 
T by means of two hydrogen bonds, while C pairs with G 
by means of three hydrogen bonds (i.e. the strength of the 
former bond is weaker than the strength of the latter). For 
DNA computing purposes, one is only concerned with single- 
stranded (henceforth, oligonucleotide) DNA sequences. 

Oligonucleotide DNA sequences are formed by heating 
DNA double helices to denaturation temperatures, at which 
they break down into single strands. If the temperature is 
subsequently reduced, oligonucleotide strands with large re- 
gions of sequence complementarity can bind back together in 
a process called hybridization. Hybridization is assumed to 
occur only between complementary base pairs, and lies at the 
core of DNA computing. 

As a first approximation, oligonucleotide DNA sequences 
can be simply viewed as words over a four-letter alphabet $Q = 
\{A, C, G, T\}$, with a prescribed set of complex properties. The 
generic notation for such sequences will be $\mathbf{q} = q_1 q_2 \ldots q_n$, 
with $n$ indicating the length of the sequence. The WC 
complement $\overline{\mathbf{q}}$ of a DNA sequence $\mathbf{q}$ is defined as 
$\overline{q_1}\,\overline{q_2} \ldots \overline{q_n}$, with $\overline{q_i}$ being the WC complement of $q_i$ as given by (1). 

The secondary structure of a DNA codeword $q_1 q_2 \ldots q_n$ is 
a set, $S$, of disjoint pairings between complementary bases 
$(q_i, q_j)$ with $i < j$. A secondary structure is formed by 
a chemically active oligonucleotide sequence folding back 
onto itself due to self-hybridization, i.e., hybridization between 
complementary base pairs belonging to the same sequence. As 
a consequence of the bending, elaborate spatial structures are 
formed, the most important components of which are loops 
(including branching, internal, hairpin and bulge loops), stem 
helical regions, as well as unstructured single strands. Figure 1 
illustrates these concepts for an RNA strand (see footnote 1). It was shown 
experimentally that the most important factors influencing the 
secondary structure of a DNA sequence are the number of base 
pairs in stem regions, the number of base pairs in a hairpin 
loop region as well as the number of unpaired bases. 

Fig. 1. DNA/RNA secondary structure model (reprinted from [9]). 

Determining the exact pairings in a secondary structure 
of a DNA sequence is a complicated task, as we shall try 
to explain briefly. For a system of interacting entities, one 
measure commonly used for assessing the system's property 
is the free energy. The stability and form of a secondary 
configuration is usually governed by this energy, the general 
rule-of-thumb being that a secondary structure minimizes the 
free energy associated with a DNA sequence. The free energy 
of a secondary structure is determined by the energy of its 
constituent pairings. Now, the energy of a pairing depends on 
the bases involved in the pairing as well as all bases adjacent 
to it. A further complication is that in the presence of 
other neighboring pairings, these energies change according to 
some nontrivial rules. 

Nevertheless, some simple dynamic programming techniques 
can be used to approximately determine the secondary 
structure of a DNA sequence. Such approximations usually 
have the correct form in 70% of the cases considered. Among 
these techniques, Nussinov's folding algorithm is the most 
widely used scheme [10]. Nussinov's algorithm is based on 
the assumption that in a DNA sequence $c_1 c_2 \ldots c_n$, the energy 
between a pair of bases, $\alpha(c_i, c_j)$, is independent of all 
other pairs. For simplicity of exposition, we shall assume 
that $\alpha(c_i, c_j) = -1$ if $\{c_i, c_j\} = \{A, T\}$, $\alpha(c_i, c_j) = -2$ 
if $\{c_i, c_j\} = \{G, C\}$, and $\alpha(c_i, c_j) = 0$ otherwise (see footnote 2). Let $E_{i,j}$ 
denote the minimum free energy of the subsequence $c_i, \ldots, c_j$. 
The independence assumption allows us to compute the minimum 
free energy of the sequence $c_1, c_2, \ldots, c_n$ through the 
recursion 

$$E_{i,j} = \min \begin{cases} E_{i+1,j-1} + \alpha(c_i, c_j), \\ E_{i,k-1} + E_{k,j}, & i < k \le j, \end{cases} \qquad (2)$$

where $E_{i,i} = 0$ for $i = 1, 2, \ldots, n$ and $E_{i,i-1} = 0$ for $i = 
2, \ldots, n$. The value of $E_{1,n}$ is the minimum free energy of 
a secondary structure of $c_1, c_2, \ldots, c_n$. Note that $E_{1,n} \le 0$. 
A very large negative value for the free energy $E_{1,n}$ of a 
sequence is a good indicator of the presence of stacked base 
pairs and loops, i.e., a secondary structure, in the physical 
DNA sequence. 

Nussinov's algorithm can be described in terms of free-energy 
tables, two of which are shown below. We first describe 
how such a table is filled out, after which we will point out 
some important properties of such tables. In a free-energy 
table, the entry at position $(i, j)$ (the top left position being 
$(1, 1)$) contains the value of $E_{i,j}$. The table is filled out by 
initializing the entries on the main diagonal and on the first 
lower sub-diagonal of the matrix to zero, and calculating the 
energy levels according to the recursion in (2). The calculations 
proceed successively through the upper diagonals: entries 
at positions $(1, 2), (2, 3), \ldots, (n-1, n)$ are calculated first, 
followed by entries at positions $(1, 3), (2, 4), \ldots, (n-2, n)$, 
and so on. Note that the entry at $(i, j)$, $j > i$, depends 
on $\alpha(c_i, c_j)$ and the entries at $(i, l)$, $l = i, \ldots, j-1$, 
$(l, j)$, $l = i+1, \ldots, j$, and $(i+1, j-1)$. 

TABLE I 
Free-energy table for the sequence GCGCCCCGC 

TABLE II 
Free-energy table for the sequence GAGGGTTTT 

1 Oligonucleotide DNA sequences are structurally very similar to RNA 
sequences, which are by their very nature single-stranded, and consist of the 
same bases as DNA strands, except for thymine being replaced by uracil (U). 

2 Experimentally obtained interaction energies, depending on the choice of 
the base pair, can be easily incorporated into the model instead of the constants 
-1 and -2. 
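The table-filling procedure just described can be sketched in a few lines of Python. This is only an illustration under the simplified energies assumed in the text ($-1$ for A-T, $-2$ for G-C, $0$ otherwise); the function names `alpha`, `free_energy_table` and `min_free_energy` are hypothetical and not taken from any package.

```python
def alpha(a, b):
    """Pairing energy assumed in the text: -1 for A-T, -2 for G-C, 0 otherwise."""
    pair = {a, b}
    if pair == {"A", "T"}:
        return -1
    if pair == {"G", "C"}:
        return -2
    return 0


def free_energy_table(seq):
    """Fill E[i][j] (1-based indices) one upper diagonal at a time."""
    n = len(seq)
    # Indices 0..n+1 so that E[i][i-1] and E[i+1][j-1] always exist.
    # Every entry starts at 0, which covers the main diagonal and the
    # first lower sub-diagonal.
    E = [[0] * (n + 2) for _ in range(n + 2)]
    for d in range(1, n):                  # d = j - i indexes the upper diagonal
        for i in range(1, n - d + 1):
            j = i + d
            best = E[i + 1][j - 1] + alpha(seq[i - 1], seq[j - 1])
            for k in range(i + 1, j + 1):  # split points i < k <= j
                best = min(best, E[i][k - 1] + E[k][j])
            E[i][j] = best
    return E


def min_free_energy(seq):
    """Value of E[1][n] for the given sequence."""
    return free_energy_table(seq)[1][len(seq)]


if __name__ == "__main__":
    print(min_free_energy("GCGCCCCGC"))  # -6
    print(min_free_energy("GAGGGTTTT"))  # -1
```

The two values printed agree with the minimum free energies quoted in the text for these two sequences.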

The minimum-energy secondary structure itself can be 
found by the backtracking algorithm [10] which retraces the 
steps of Nussinov's algorithm. The secondary structures for the 
sequences in Tables I and II, shown in Figures 2 and 3, have 
been found using the Vienna RNA/DNA secondary structure 
package [12], which is based on the Nussinov algorithm, but 
which uses more accurate values for the parameters $\alpha(c_i, c_j)$, 
as well as more sophisticated prediction methods for base 
pairing probabilities. 
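As a rough illustration of the backtracking step, the following Python sketch recovers one minimum-energy set of pairings from the filled table. It assumes the simplified energies of the text and is not the procedure used by the Vienna package; the names `alpha`, `fill` and `traceback` are hypothetical.

```python
def alpha(a, b):
    """Simplified pairing energy: -1 for A-T, -2 for G-C, 0 otherwise."""
    return {frozenset("AT"): -1, frozenset("GC"): -2}.get(frozenset((a, b)), 0)


def fill(seq):
    """Free-energy table E[i][j], 1-based, filled diagonal by diagonal."""
    n = len(seq)
    E = [[0] * (n + 2) for _ in range(n + 2)]
    for d in range(1, n):
        for i in range(1, n - d + 1):
            j = i + d
            cands = [E[i + 1][j - 1] + alpha(seq[i - 1], seq[j - 1])]
            cands += [E[i][k - 1] + E[k][j] for k in range(i + 1, j + 1)]
            E[i][j] = min(cands)
    return E


def traceback(seq):
    """Return one minimum-energy list of pairs (i, j), 1-based, with i < j."""
    E = fill(seq)
    pairs = []
    stack = [(1, len(seq))]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        a = alpha(seq[i - 1], seq[j - 1])
        # Case 1: positions i and j are paired with each other.
        if a < 0 and E[i][j] == E[i + 1][j - 1] + a:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
            continue
        # Case 2: the optimum splits into two independent sub-intervals.
        for k in range(i + 1, j + 1):
            if E[i][j] == E[i][k - 1] + E[k][j]:
                stack.append((i, k - 1))
                stack.append((k, j))
                break
    return sorted(pairs)


if __name__ == "__main__":
    print(traceback("GCGCCCCGC"))
    print(traceback("GAGGGTTTT"))
```

When several structures attain the same minimum energy, this sketch returns only one of them, determined by the order in which the cases are tried.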

Tables I and II show that the minimum free energy for the 
sequence GCGCCCCGC is $-6$, while that for the sequence 
GAGGGTTTT is $-1$ (see footnote 3). This fact alone indicates that the 
number of paired bases in the first sequence ought to be larger 
than in the second one, and hence the former is more likely 
to have a secondary structure than the latter. 

3 Observe that although the free energy in the second case is $-1$, the 
sequence is deemed to have no secondary structure; this is due to the fact 
that the one possible complementary base pairing, namely that of A and T, 
forms too weak a bond for the resultant structure to be stable. 

Fig. 2. Secondary structure of the sequence GCGCCCCGC. 

More generally, one can observe the following characteristics 
of free-energy tables: if the first upper diagonal contains a 
large number of $-1$ or $-2$ entries, then some of these entries 
"percolate" through to the second upper diagonal, where they 
may be augmented by a further $-1$ or $-2$ if complementary base 
pairs are present at positions $i$ and $i + 2$, $1 \le i \le n - 2$, 
in the DNA sequence. The values on the second diagonal, in 
turn, percolate through to the third diagonal, and so on. Hence, 
the free energy of the DNA sequence depends strongly on the 
number of non-zero values present on the first diagonal. This 
phenomenon was also observed experimentally in [2], where 
the free energy was modelled by a function of the form 

$$E_{1,n} = \kappa + \sum_{i=1}^{n-1} \alpha(c_i, c_{i+1}), \qquad (3)$$

with $\kappa$ denoting a correction factor which depends on the 
number of G and C bases in the sequence $\mathbf{c}$. The stability of 
a secondary structure, as well as its melting properties, can be 
directly inferred from (3). Note that under the assumption that 
$\alpha = -1$ for all pairings, the absolute value of the sum in (3) 
is simply the total number of pairings of complementary bases 
between the DNA codeword $\mathbf{c}$ and the codeword shifted one 
position to the right or, equivalently, the sum of the entries 
in the first upper diagonal of the free-energy table. These 
observations imply that a more accurate model for the free 
energy should be of the form 

$$E_{1,n} = \kappa + \gamma_1 \sum_{i=1}^{n-1} \alpha(c_i, c_{i+1}) + \gamma_2 \sum_{i=1}^{n-2} \alpha(c_i, c_{i+2}) + \cdots + \gamma_l \sum_{i=1}^{n-l} \alpha(c_i, c_{i+l}),$$

for a correction factor $\kappa$ and some non-zero weighting factors 
$\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_l$. Furthermore, the same observation 
implies that from the standpoint of designing DNA codewords 
without secondary structure, it is desirable to have codewords 
for which the respective sums of the elements on the first 
several diagonals are either all zero or of some very small 
absolute value. This requirement can be rephrased in terms 
of requiring a DNA sequence to satisfy a shift property, in 
which a sequence and its first few shifts have few or no 
complementary base pairs at the same positions. 




Fig. 3. Absence of secondary structure in the sequence GAGGGTTTT. 



III. DNA Codewords Satisfying a Shift Property 

In this section, we consider the enumeration and construc- 
tion of DNA sequences satisfying certain shift properties, 
which we shall define rigorously. 

A. Sequence Enumeration 

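Definition 3.1: The Watson-Crick (WC) distance between 
two DNA sequences $\mathbf{p} = p_1 p_2 \ldots p_n$ and $\mathbf{q} = q_1 q_2 \ldots q_n$ is 
defined as 

$$d_{WC}(\mathbf{p}, \mathbf{q}) = |\{i : p_i \ne \overline{q_i}\}|, \qquad (4)$$

i.e., $d_{WC}(\mathbf{p}, \mathbf{q}) = d_H(\mathbf{p}, \overline{\mathbf{q}})$, where $d_H$ stands for the standard 
Hamming distance. 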

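Given a DNA codeword $\mathbf{q}$, we shall denote by $\mathbf{q}_{[i,j]}$, $1 \le 
i \le j \le n$, the subsequence $q_i q_{i+1} \ldots q_j$. For $0 \le i \le n - 1$, 
we define 

$$\mu_i(\mathbf{q}) = n - i - d_{WC}(\mathbf{q}_{[1,n-i]}, \mathbf{q}_{[i+1,n]}). \qquad (5)$$

In other words, $\mu_i(\mathbf{q})$ is the number of indices $l \in 
\{1, 2, \ldots, n - i\}$ such that $q_l = \overline{q_{l+i}}$. A shift property of 
$\mathbf{q}$ is now simply any sort of restriction imposed on $\mu_i(\mathbf{q})$. 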
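The two quantities just defined are straightforward to compute. The Python sketch below is one possible way to do so; the helper names `wc_distance` and `mu` are hypothetical, and the test sequences are taken from Example 3.8 later in this section.

```python
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def wc_distance(p, q):
    """Number of positions i at which p_i is NOT the WC complement of q_i."""
    if len(p) != len(q):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(p, q) if a != WC[b])


def mu(q, i):
    """mu_i(q): number of complementary pairs between q and its shift by i."""
    n = len(q)
    # q[:n - i] is q_[1, n-i] and q[i:] is q_[i+1, n] in 1-based notation.
    return (n - i) - wc_distance(q[: n - i], q[i:])


if __name__ == "__main__":
    for word in ("TGGCTCA", "TAGCCTG", "GGGAGAA"):
        print(word, [mu(word, i) for i in range(1, len(word))])
```

A screening routine for candidate codewords could simply reject any sequence whose values `mu(q, 1), ..., mu(q, s)` exceed a chosen threshold.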

Given $s \ge 1$, let $g_s(n)$ denote the number of sequences $\mathbf{q}$ 
of length $n$ for which $\mu_i(\mathbf{q}) = 0$, $i = 1, \ldots, s$. For $n \le s$, we 
take $g_s(n)$ to be $g_{n-1}(n)$. 

Lemma 3.1: For all $n \ge 1$, $g_{n-1}(n) = 4(2^n - 1)$. 

Proof: It is clear that a DNA sequence is counted by 
$g_{n-1}(n)$ iff it contains no pair of complementary bases. Such 
a sequence must be over one of the alphabets $\{A, G\}$, $\{A, C\}$, 
$\{T, G\}$ and $\{T, C\}$. There are $4(2^n - 1)$ such sequences, since 
there are $2^n$ sequences over each of these alphabets, of which 
$A^n$, $T^n$, $G^n$ and $C^n$ are each counted twice. ■ 

Lemma 3.2: For all $n > s$, 

$$g_s(n) = 2 g_s(n - 1) + g_s(n - s).$$

Proof: Let $\mathcal{G}_s(n)$ denote the set of all sequences $\mathbf{q}$ 
of length $n$ for which $\mu_i(\mathbf{q}) = 0$, $i = 1, \ldots, s$. Thus, 
$|\mathcal{G}_s(n)| = g_s(n)$. Note that for any $\mathbf{q} \in \mathcal{G}_s(n)$, $\mathbf{q}_{[n-s,n]}$ 
cannot contain a complementary pair of bases, and hence 
cannot contain three distinct bases. Let $\mathcal{E}(n)$ denote the set 
of sequences $q_1 \ldots q_n \in \mathcal{G}_s(n)$ such that $q_{n-s+1} = q_{n-s+2} = 
\cdots = q_n$, and let $\mathcal{U}(n) = \mathcal{G}_s(n) \setminus \mathcal{E}(n)$. We thus have 
$|\mathcal{E}(n)| + |\mathcal{U}(n)| = g_s(n)$. Each sequence in $\mathcal{E}(n)$ is obtained 
from some sequence $q_1 \ldots q_{n-s+1} \in \mathcal{G}_s(n - s + 1)$ by appending 
$s - 1$ bases, $q_{n-s+2}, \ldots, q_n$, all equal to $q_{n-s+1}$. Hence, 
$|\mathcal{E}(n)| = |\mathcal{G}_s(n - s + 1)| = g_s(n - s + 1)$, and therefore, 
$|\mathcal{U}(n)| = g_s(n) - g_s(n - s + 1)$. 

Now, observe that each sequence $q_1 q_2 \ldots q_n \in \mathcal{G}_s(n)$ is 
obtained by appending a single base, $q_n$, to some sequence 
$q_1 q_2 \ldots q_{n-1} \in \mathcal{G}_s(n - 1)$. If $q_1 q_2 \ldots q_{n-1}$ is in fact in 
$\mathcal{E}(n - 1)$, then there are three choices for $q_n$, namely any base 
other than the complement of the repeated base. Otherwise, if 
$q_1 q_2 \ldots q_{n-1} \in \mathcal{U}(n - 1)$, there are only two possible choices 
for $q_n$. Hence, 

$$g_s(n) = 3|\mathcal{E}(n - 1)| + 2|\mathcal{U}(n - 1)| = 3 g_s(n - s) + 2 \left( g_s(n - 1) - g_s(n - s) \right).$$

This proves the claimed result. ■ 
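For small parameters, Lemmas 3.1 and 3.2 can be checked by exhaustive enumeration. The following Python sketch does this by brute force over all $4^n$ sequences; it is meant only as a sanity check, and the function names `satisfies_shifts` and `g` are hypothetical.

```python
from itertools import product

WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def satisfies_shifts(q, s):
    """True if no two bases of q at distance 1..s are WC complements."""
    n = len(q)
    # For n <= s only the shifts 1..n-1 exist, which matches the convention
    # g_s(n) = g_{n-1}(n) used in the text.
    return all(
        q[l] != WC[q[l + i]]
        for i in range(1, min(s, n - 1) + 1)
        for l in range(n - i)
    )


def g(s, n):
    """Brute-force count of length-n sequences with mu_i = 0 for i = 1..s."""
    return sum(1 for q in product("ACGT", repeat=n) if satisfies_shifts(q, s))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, g(n, n), 4 * (2 ** n - 1))          # Lemma 3.1
    for s in range(1, 4):
        for n in range(s + 1, 7):
            print(s, n, g(s, n), 2 * g(s, n - 1) + g(s, n - s))  # Lemma 3.2
```

The enumeration is exponential in $n$ and is therefore only practical for short sequences; for larger $n$ the recursion itself should be used.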
From Lemmas 3.1 and 3.2 we obtain the following result. 

Theorem 3.3: The generating function $G_s(z) = 
\sum_{n=1}^{\infty} g_s(n) z^{-n}$ is given by 

$$G_s(z) = 4\, \frac{z^{s-1} + z^{s-2} + \cdots + z + 1}{z^s - 2z^{s-1} - 1}.$$

It can be shown that for $s > 1$, the polynomial $\psi_s(z) = 
z^s - 2z^{s-1} - 1$ in the denominator of $G_s(z)$ has a real root, $\rho_s$, 
in the interval $(2, 3)$, and $s - 1$ other roots within the unit circle. 
It follows that $g_s(n) \sim \beta_s (\rho_s)^n$ for some constant $\beta_s > 0$. 
It is easily seen that $\rho_s$ decreases as $s$ increases, and that 
$\lim_{s \to \infty} \rho_s = 2$. 
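The growth rate can be explored numerically. The Python sketch below computes $g_s(n)$ from the closed form of Lemma 3.1 (for $n \le s$) and the recursion of Lemma 3.2 (for $n > s$), and locates the dominant root of $z^s - 2z^{s-1} - 1$ by bisection on $[2, 3]$. The function names `g_values` and `rho` are hypothetical, and bisection is used here only as a simple illustrative root finder.

```python
def g_values(s, n_max):
    """Return a list g with g[n] = g_s(n) for n = 1..n_max (g[0] is unused)."""
    g = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n <= s:
            g[n] = 4 * (2 ** n - 1)          # Lemma 3.1 with the n <= s convention
        else:
            g[n] = 2 * g[n - 1] + g[n - s]   # Lemma 3.2
    return g


def rho(s, tol=1e-12):
    """Root of psi_s(z) = z^s - 2 z^(s-1) - 1 in [2, 3], found by bisection."""
    def psi(z):
        return z ** s - 2 * z ** (s - 1) - 1

    lo, hi = 2.0, 3.0   # psi(2) = -1 < 0 and psi(3) = 3^(s-1) - 1 >= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if psi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for s in range(2, 7):
        g = g_values(s, 40)
        print(s, rho(s), g[40] / g[39])
```

For each $s$, the ratio of consecutive terms should approach the computed root, and the roots should decrease towards 2 as $s$ grows.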

Theorem 3.4: The number of length-$n$ DNA sequences $\mathbf{q}$ 
such that $\mu_1(\mathbf{q}) = m$ is $4 \binom{n-1}{m} 3^{n-m-1}$. 

Proof: Let $B(n, m)$ be the set of length-$n$ DNA sequences 
$\mathbf{q}$ such that $\mu_1(\mathbf{q}) = m$. A sequence $\mathbf{q} = q_1 q_2 \ldots q_n$ 
is in $B(n, m)$ iff the set $I = \{i : q_i = \overline{q_{i-1}}\}$ has cardinality 
$m$. So, to construct such a sequence, we first arbitrarily pick 
a $q_1$ and an $I \subset \{2, 3, \ldots, n\}$, $|I| = m$, which can be done 
in $4 \binom{n-1}{m}$ ways. The rest of $\mathbf{q}$ is constructed recursively: for 
$i \ge 2$, set $q_i = \overline{q_{i-1}}$ if $i \in I$, and pick a $q_i \ne \overline{q_{i-1}}$ if $i \notin I$. 
Thus, there are 3 choices for each $i \ge 2$, $i \notin I$, and hence a 
total of $4 \binom{n-1}{m} 3^{n-m-1}$ sequences $\mathbf{q}$ in $B(n, m)$. ■ 
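The count in Theorem 3.4 is easy to confirm for short sequences by exhaustive enumeration, as in the Python sketch below. The helper names `mu1` and `count_by_mu1` are hypothetical.

```python
from itertools import product
from math import comb

WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mu1(q):
    """Number of adjacent positions (i, i+1) holding complementary bases."""
    return sum(1 for a, b in zip(q, q[1:]) if a == WC[b])


def count_by_mu1(n):
    """Map m -> number of length-n sequences with mu_1 = m (brute force)."""
    counts = {}
    for q in product("ACGT", repeat=n):
        m = mu1(q)
        counts[m] = counts.get(m, 0) + 1
    return counts


if __name__ == "__main__":
    n = 5
    counts = count_by_mu1(n)
    for m in range(n):
        formula = 4 * comb(n - 1, m) * 3 ** (n - 1 - m)
        print(m, counts.get(m, 0), formula)
```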

The enumeration of DNA sequences satisfying any sort of 
shift property becomes considerably more difficult if we bring 
in the additional requirement of constant GC-content. 

Definition 3.2: The GC-content, $w_{GC}(\mathbf{q})$, of a DNA sequence 
$\mathbf{q} = q_1 q_2 \ldots q_n$ is defined to be the number of indices 
$i$ such that $q_i \in \{G, C\}$. 

A DNA code in which all codewords have the same GC-content, 
$w$, is called a constant GC-content code. The constant 
GC-content constraint is introduced in order to achieve 
parallelized operations on DNA sequences, by assuring similar 
thermodynamic characteristics of all codewords. The GC-content 
usually needs to be in the range of 30% to 50% of the 
length of the code. 

The following result can be proved by applying the powerful 
Goulden-Jackson method of combinatorial enumeration [4, 
Section 2.8]. 

Theorem 3.5: The number of DNA sequences $\mathbf{q}$ of length 
$n$ and GC-content $w$, such that $\mu_1(\mathbf{q}) = 0$, is given by the 
coefficient of $x^n y^w$ in the (formal) power series expansion of 

$$\Phi(x, y) = \left( 1 - \frac{2x}{1 + x} - \frac{2xy}{1 + xy} \right)^{-1}.$$
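The numbers counted in Theorem 3.5 can be cross-checked for small $n$ without any generating-function machinery. The Python sketch below uses a simple dynamic program over the class of the last base ($\{A, T\}$ or $\{G, C\}$) rather than the Goulden-Jackson method, and compares it against brute-force enumeration; the function names `count_dp` and `count_brute` are hypothetical.

```python
from itertools import product

WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_dp(n, w):
    """Length-n sequences (n >= 1) with GC-content w and no adjacent WC pair."""
    # at[k] / gc[k]: number of sequences built so far with GC-content k
    # whose last base lies in {A, T} / {G, C}.
    at = [0] * (n + 2)
    gc = [0] * (n + 2)
    at[0] = 2   # "A", "T"
    gc[1] = 2   # "G", "C"
    for _ in range(n - 1):
        new_at = [0] * (n + 2)
        new_gc = [0] * (n + 2)
        for k in range(n + 1):
            # Next base in {A, T}: one choice after A/T (repeat it), two after G/C.
            new_at[k] += at[k] + 2 * gc[k]
            # Next base in {G, C}: two choices after A/T, one after G/C.
            new_gc[k + 1] += 2 * at[k] + gc[k]
        at, gc = new_at, new_gc
    return at[w] + gc[w]


def count_brute(n, w):
    """Same count by exhaustive enumeration (only practical for small n)."""
    total = 0
    for q in product("ACGT", repeat=n):
        if sum(c in "GC" for c in q) != w:
            continue
        if all(a != WC[b] for a, b in zip(q, q[1:])):
            total += 1
    return total


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, [count_dp(n, w) for w in range(n + 1)])
```

For $n = 2$ the counts are 2, 8 and 2 for $w = 0, 1, 2$, which sum to the 12 sequences of length 2 that contain no complementary adjacent pair.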



B. Code Construction 

The problem of constructing DNA codewords obeying some 
form of a shift constraint can be reduced to a binary code 
design problem by mapping the DNA alphabet onto the set of 
length-two binary words as follows: 

$$A \to 00, \quad T \to 01, \quad C \to 10, \quad G \to 11. \qquad (6)$$

Let $\mathbf{q}$ be a DNA sequence. The sequence $b(\mathbf{q})$ obtained by 
applying coordinatewise the mapping given in (6) to $\mathbf{q}$ will be 
called the binary image of $\mathbf{q}$. If $b(\mathbf{q}) = b_0 b_1 b_2 \ldots b_{2n-1}$, then 
the subsequence $e(\mathbf{q}) = b_0 b_2 \ldots b_{2n-2}$ will be referred to as 
the even subsequence of $b(\mathbf{q})$, and $o(\mathbf{q}) = b_1 b_3 \ldots b_{2n-1}$ will 
be called the odd subsequence of $b(\mathbf{q})$. 

The WC distance, $d_{WC}(\mathbf{p}, \mathbf{r})$, between two DNA words $\mathbf{p}, \mathbf{r}$ 
can be expressed in terms of the even and odd subsequences, 
as stated in the lemma below. For notational ease, given binary 
words $\mathbf{x} = (x_i)$ and $\mathbf{y} = (y_i)$, we define $\mathbf{x} \oplus \mathbf{y} = (x_i + y_i)$, 
the sum being taken modulo 2, and $\mathbf{x} * \mathbf{y} = (x_i y_i)$. 

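Lemma 3.6: Let $\mathbf{p}$ and $\mathbf{r}$ be two words of length $n$ over the 
alphabet $Q$, and define $\sigma_e = e(\mathbf{p}) \oplus e(\mathbf{r})$, $\sigma_o = o(\mathbf{p}) \oplus o(\mathbf{r})$. 
Using $\overline{\mathbf{x}}$ to denote the complement of a binary sequence $\mathbf{x}$, 
we can express the WC distance between $\mathbf{p}$ and $\mathbf{r}$ as 

$$d_{WC}(\mathbf{p}, \mathbf{r}) = n - w_H(\overline{\sigma_e} * \sigma_o),$$

where $w_H(\cdot)$ denotes Hamming weight. Consequently, if for 
some length-$n$ DNA sequence $\mathbf{q}$, we take $\mathbf{p} = \mathbf{q}_{[1,n-i]}$ and 
$\mathbf{r} = \mathbf{q}_{[i+1,n]}$, then $\mathbf{q}$ satisfies the $i$-th shift constraint $\mu_i(\mathbf{q}) = 0$ 
iff 

$$w_H(\overline{\sigma_e} * \sigma_o) = 0. \qquad (7)$$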
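The identity in Lemma 3.6 can be verified exhaustively for short words. The Python sketch below computes the WC distance both directly and through the even and odd binary subsequences of the mapping (6); the function names `even_odd`, `wc_distance_binary` and `wc_distance_direct` are hypothetical.

```python
from itertools import product

# Mapping (6): each base -> (even bit, odd bit).
BITS = {"A": (0, 0), "T": (0, 1), "C": (1, 0), "G": (1, 1)}
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def even_odd(q):
    """Even and odd subsequences of the binary image of q."""
    return [BITS[c][0] for c in q], [BITS[c][1] for c in q]


def wc_distance_binary(p, r):
    """d_WC(p, r) = n - w_H(complement(sigma_e) * sigma_o)."""
    ep, op = even_odd(p)
    er, orr = even_odd(r)
    sigma_e = [a ^ b for a, b in zip(ep, er)]
    sigma_o = [a ^ b for a, b in zip(op, orr)]
    # A position counts when the even bits agree and the odd bits differ,
    # which happens exactly for the pairs A-T and C-G.
    weight = sum((1 - se) * so for se, so in zip(sigma_e, sigma_o))
    return len(p) - weight


def wc_distance_direct(p, r):
    """Number of positions where p_i is not the WC complement of r_i."""
    return sum(1 for a, b in zip(p, r) if a != WC[b])


if __name__ == "__main__":
    words = ["".join(w) for w in product("ACGT", repeat=3)]
    ok = all(wc_distance_binary(p, r) == wc_distance_direct(p, r)
             for p in words for r in words)
    print(ok)
```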



In a companion paper [8], we described a sample of 
construction methods for DNA codes which reduce undesired 
hybridization and allow for fully parallel DNA system operation. 
Among the constraints identified for this problem are 
the base runlength constraint, the constant GC-content constraint, 
and the Hamming and reverse-complement Hamming 
distance constraint. We will show next that it is straightforward 
to incorporate these hybridization constraints into a scheme 
which also allows for constructing sequences with reduced 
probability of secondary structure formation. The idea is based 
on the use of the non-zero codewords of cyclic simplex codes 
[5, Chapter 8] or subsets thereof. Recall that a cyclic simplex 
code of dimension $m$ is a simplex code of length $n = 2^m - 1$ 
composed of the all-zeros codeword and the $n$ distinct cyclic 
shifts of any non-zero codeword. 

Theorem 3.7: Let $\mathcal{C}$ be a DNA code obtained by choosing 
the set of non-zero codewords of a cyclic simplex code of 
length $n = 2^m - 1$ for both the even and odd binary code 
component. Such a code contains $(2^m - 1)^2$ codewords with 
the property that for all $i \in \{1, 2, \ldots, n - 1\}$ and $\mathbf{q} \in \mathcal{C}$, 

$$\mu_i(\mathbf{q}) \le 2^{m-2}.$$

Proof: It is straightforward to see that for any $\mathbf{q} \in \mathcal{C}$, 
if we let $\mathbf{p} = \mathbf{q}_{[1,n-i]}$, $\mathbf{r} = \mathbf{q}_{[i+1,n]}$, then $\sigma_e$ and $\sigma_o$, defined 
as in Lemma 3.6, are just truncations of codewords from the 
simplex code. Since the simplex code is a constant-weight 
code, with weight and minimum distance $2^{m-1}$, the supports of 
each pair of distinct non-zero codewords intersect in exactly $2^{m-2}$ positions. 
This implies that there 
exist exactly $2^{m-2}$ positions for which one given codeword 
contains all zeros, and the other codeword contains all ones. 
These are the positions that are counted in $w_H(\overline{\sigma_e} * \sigma_o)$, which 
proves the claimed result. ■ 
Example 3.8: Consider the previous construction for $m = 
3$, and a generating codeword 1110100. There are 49 DNA 
codewords of length 7 obtained based on the outlined method. 
These codewords have minimum Hamming distance equal to 
four, and they also have constant GC-content $w = 4$. A 
selected subset of codewords from this code is listed below. 

TGGCTCA, TCCGTGA, CACGGTC, TAGCCTG, 
CATGGCT, GATCCGT, GGGAGAA, GGAGAAG. 

The last two codewords consist of the bases G and A only, 
and clearly satisfy the shift property with $\mu_i = 0$ for all 
$i < 7$. On the other hand, for the first three codewords one 
has $\mu_1 = 1$, while for the next three codewords it holds that 
$\mu_1 = 2$ (meeting the upper bound in the theorem). Due to the 
cyclic nature of the generating code, one can easily generate 
the Nussinov folding table for all the codewords [8]. Such an 
evaluation, as well as the use of the Vienna program package, 
shows that none of the 49 codewords exhibits a secondary 
structure. The largest known DNA code with the parameters 
specified above consists of 72 codewords [3]. This code is 
generated by a simulated annealing process which does not 
allow for simple secondary structure testing. 
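The construction in this example is small enough to reproduce directly. The Python sketch below pairs every cyclic shift of the generating word 1110100 (used as the even component) with every cyclic shift (used as the odd component), maps the bit pairs back to bases through (6), and reports the code size, GC-content, largest shift count $\mu_i$ and minimum Hamming distance. The helper names `cyclic_shifts`, `build_code`, `mu` and `hamming` are hypothetical.

```python
# Inverse of mapping (6): (even bit, odd bit) -> base.
LETTER = {(0, 0): "A", (0, 1): "T", (1, 0): "C", (1, 1): "G"}
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def cyclic_shifts(word):
    """All cyclic rotations of a binary string."""
    return [word[k:] + word[:k] for k in range(len(word))]


def build_code(generator="1110100"):
    """DNA words whose even and odd components are shifts of the generator."""
    shifts = cyclic_shifts(generator)
    return [
        "".join(LETTER[(int(e), int(o))] for e, o in zip(even, odd))
        for even in shifts
        for odd in shifts
    ]


def mu(q, i):
    """Number of positions l with q_l complementary to q_(l+i)."""
    return sum(1 for a, b in zip(q[: len(q) - i], q[i:]) if a == WC[b])


def hamming(p, q):
    """Standard Hamming distance between two equal-length words."""
    return sum(1 for a, b in zip(p, q) if a != b)


if __name__ == "__main__":
    code = build_code()
    print(len(code))
    print({sum(c in "GC" for c in q) for q in code})
    print(max(mu(q, i) for q in code for i in range(1, 7)))
    print(min(hamming(p, q) for p in code for q in code if p != q))
```

The same routine can be run with a generating word of a longer cyclic simplex code, provided that word is supplied by the user; checking the resulting codewords with a folding predictor remains a separate step.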

References 

[1] L.M. Adleman, "Molecular Computation of Solutions to Combinatorial 
Problems," Science, vol. 266, pp. 1021-1024, Nov. 1994. 

[2] K. Breslauer, R. Frank, H. Blocker, and L. Marky, "Predicting DNA 
Duplex Stability from the Base Sequence," Proceedings of the National 
Academy of Sciences USA, vol. 83, pp. 3746-3750, 1986. 

[3] P. Gaborit and O.D. King, "Linear constructions for DNA codes," preprint. 

[4] I.P. Goulden and D.M. Jackson, Combinatorial Enumeration, Dover, 
2004. 

[5] J.I. Hall, Lecture notes on error-control coding, available online at 
http://www.mth.msu.edu/~jhall/. 

[6] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting 
Codes, North-Holland, 1977. 

[7] M. Mansuripur, P.K. Khulbe, S.M. Kuebler, J.W. Perry, M.S. Giridhar 
and N. Peyghambarian, "Information Storage and Retrieval using 
Macromolecules as Storage Media," University of Arizona Technical Report, 
2003. 

[8] O. Milenkovic and N. Kashyap, "New Constructions of Codes for DNA 
Computing," accepted for presentation at WCC 2005, Bergen, Norway. 

[9] S. Mneimneh, "Computational Biology Lecture 20: RNA secondary 
structures," available online at 
engr.smu.edu/~saad/courses/cse8354/lectures/lecture20.pdf. 

[10] R. Nussinov, G. Pieczenik, J.R. Griggs and D.J. Kleitman, "Algorithms 
for loop matchings," SIAM J. Appl. Math., vol. 35, no. 1, pp. 68-82, 
1978. 

[11] S. Tsaftaris, A. Katsaggelos, T. Pappas and E. Papoutsakis, "DNA 
Computing from a Signal Processing Viewpoint," IEEE Signal Processing 
Magazine, pp. 100-106, Sept. 2004. 

[12] The Vienna RNA Secondary Structure Package, 
http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi. 



